We used data from the 2016 American Community Survey (ACS) for Puerto Rico to examine wage gaps between individuals with different education levels. Our research questions are: 1) How do earnings vary by education level? 2) How does the premium for education vary by gender? The ACS is a nationally representative household survey; after the restrictions described below, our analytic sample contains 5,194 person records. The survey includes questions pertaining to each household member’s demographic characteristics and labor market activity.
We restrict our sample to these three racial groups: White, Black and Other. In addition, given our goal of examining earning differences by gender and marital status and the reporting of earnings in the ACS on an annual basis (wages, salary, commissions, bonuses, tips, and self-employment income during the past 12 months), we restrict our sample to full-time year-round (FTYR) workers. We define FTYR workers as individuals who report positive earnings over the past year, who worked at least 40 of the past 52 weeks, and who worked at least 35 hours per week in a usual work week over this period.
For our exploratory analysis, we looked at population breakdowns by education, age, marital status, gender, race, earnings, and work hours. We applied filters on education (high school diploma or above), age (18 to 64), and work hours (at least 35 per week).
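The sample restrictions described above can be sketched as a pair of predicates. This is a minimal illustration in Python, not the code used for the analysis. The `Person` class and its field names are hypothetical, and weeks worked is assumed to be available as a number even though the ACS reports it in ranges.

```python
from dataclasses import dataclass

# Education levels kept in the sample (high school diploma or above).
KEPT_EDUCATION = {
    "High school diploma", "Some college", "Associate's degree",
    "Bachelor's degree", "Master's degree", "Professional degree",
    "Doctorate degree",
}

@dataclass
class Person:
    # Hypothetical field names, chosen for illustration only.
    earnings: float
    weeks_worked: int
    usual_hours: int
    age: int
    education: str

def is_ftyr(p: Person) -> bool:
    """Full-time year-round: positive earnings, 40+ weeks, 35+ hours per week."""
    return p.earnings > 0 and p.weeks_worked >= 40 and p.usual_hours >= 35

def in_sample(p: Person) -> bool:
    """FTYR worker aged 18 to 64 with at least a high school diploma."""
    return is_ftyr(p) and 18 <= p.age <= 64 and p.education in KEPT_EDUCATION

people = [
    Person(34000, 52, 40, 47, "Associate's degree"),   # kept
    Person(12000, 30, 40, 35, "Bachelor's degree"),    # too few weeks
    Person(25000, 52, 20, 29, "Some college"),         # part-time hours
    Person(40000, 52, 40, 70, "Master's degree"),      # outside age range
]
sample = [p for p in people if in_sample(p)]
```

Only the first record passes every filter, so `sample` holds one person.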
An earnings histogram revealed a spike at a default maximum value (189k), apparently a top-code, which we also filtered out of the data. The earnings distribution tapers gradually above the median but drops off sharply below it, likely indicating the presence of a minimum wage. The correlation between age and earnings is very weak (about 0.2). Likewise, earnings are only weakly correlated with hours worked among those who work at least 35 hours per week. However, white individuals appear to have an earnings premium over other races, and both married and divorced individuals appear to have an earnings premium over those who have never been married. Because the correlation between age and earnings is weak, this premium is unlikely to be an age effect alone and may reflect other qualitative characteristics of those who marry. Marital status was also recategorized into a binary married / not married indicator.
Earnings appear positively correlated with English proficiency, as well as with higher levels of education. It should be noted that, on the island, fluency in English is neither required nor generally needed. As in the rest of Latin America, the predominant language in daily use (e.g. street signs, day-to-day communications) is Spanish, with English coming into use mainly in the tourism industry or in industries and companies imported from the mainland. This may help to explain the earnings difference, as those employers are likely to pay more than local businesses (note that tourism is a prominent industry on the island). Men also appear to earn a small premium over women.
The age distribution of full-time workers is skewed towards older adults, possibly indicating that younger workers have trouble finding full-time work, wait to enter the workforce, or are leaving the territory.
Examine the first 10 or 20 observations (rows of data) corresponding to variables of interest (columns) and compare the observed values to the data dictionary for person records.
| Earnings | Sex | Age | Race | Marital Status | Education | Work Week | Work Hours |
|---|---|---|---|---|---|---|---|
| 34000 | Male | 47 | White | Married | Associate’s degree | 50 to 52 | 40 |
| 13000 | Male | 58 | Black or African American | Never married | High school diploma | 50 to 52 | 40 |
| 18000 | Male | 50 | White | Married | Master’s degree | 50 to 52 | 40 |
| 10300 | Female | 39 | White | Married | Bachelor’s degree | 50 to 52 | 40 |
| 28600 | Female | 39 | Black or African American | Married | Bachelor’s degree | 50 to 52 | 45 |
| 24800 | Male | 37 | Black or African American | Married | Bachelor’s degree | 50 to 52 | 46 |
| 22000 | Female | 47 | Some Other | Never married | Associate’s degree | 50 to 52 | 40 |
| 19000 | Female | 60 | Black or African American | Never married | High school diploma | 50 to 52 | 40 |
| 87000 | Female | 58 | White | Divorced | Associate’s degree | 50 to 52 | 40 |
| 22900 | Male | 61 | White | Divorced | High school diploma | 50 to 52 | 40 |
| 19000 | Male | 39 | Black or African American | Married | Associate’s degree | 50 to 52 | 38 |
| 19600 | Female | 36 | Black or African American | Married | Bachelor’s degree | 50 to 52 | 40 |
| 48000 | Male | 30 | Two or More Races | Divorced | Some college | 50 to 52 | 40 |
| 40000 | Female | 30 | White | Never married | Some college | 50 to 52 | 40 |
| 15600 | Female | 41 | White | Never married | High school diploma | 50 to 52 | 40 |
| 12100 | Male | 46 | White | Divorced | High school diploma | 50 to 52 | 40 |
| 14000 | Male | 53 | White | Married | High school diploma | 50 to 52 | 40 |
| 80000 | Male | 38 | White | Never married | Bachelor’s degree | 50 to 52 | 40 |
| 15100 | Female | 26 | Some Other | Never married | High school diploma | 50 to 52 | 40 |
| 84000 | Female | 60 | White | Married | Doctorate degree | 50 to 52 | 40 |
Compute and examine descriptive statistics including the minimum, maximum, mean, and median for quantitative variables of interest
| Earnings (PERNP) | ss16ppr (N = 5,194) |
|---|---|
| Minimum | 10000.00 |
| Maximum | 125000.00 |
| Median | 24000.00 |
| Mean | 29278.84 |
| Age (AGEP) | ss16ppr (N = 5,194) |
|---|---|
| Minimum | 18.00 |
| Maximum | 64.00 |
| Median | 43.00 |
| Mean | not reported |
| Usual hours worked per week | ss16ppr (N = 5,194) |
|---|---|
| Minimum | 35.00000 |
| Maximum | 99.00000 |
| Median | 40.00000 |
| Mean | 41.21544 |
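The summary statistics above can be reproduced in miniature. The following sketch uses Python's standard `statistics` module on the 20 earnings values from the observation listing earlier in this section, so its results describe only those 20 rows and not the full sample of 5,194. The `summarize` helper is a hypothetical name introduced for illustration.

```python
import statistics

# Earnings for the 20 observations listed earlier in this section.
earnings = [
    34000, 13000, 18000, 10300, 28600, 24800, 22000, 19000, 87000, 22900,
    19000, 19600, 48000, 40000, 15600, 12100, 14000, 80000, 15100, 84000,
]

def summarize(values):
    """Return the four descriptive statistics used in the tables above."""
    return {
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
    }

summary = summarize(earnings)
print(summary)
```

For these 20 rows the mean (31,350) sits well above the median (20,800), the same right-skewed pattern seen in the full-sample earnings table.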
In Puerto Rico, the majority of people identify themselves as white. Smaller racial groups, including American Indian, Alaska Native, Asian, and Native Hawaiian and Other Pacific Islander, can be excluded from the analysis.
| RACWHT | Count |
|---|---|
| No | 1401 |
| Yes | 3793 |
| RACBLK | Count |
|---|---|
| No | 4417 |
| Yes | 777 |
| RACOTHER | Count |
|---|---|
| No | 4381 |
| Yes | 813 |
| MAR | Count |
|---|---|
| Married | 2580 |
| Widowed | 65 |
| Divorced | 944 |
| Separated | 105 |
| Never married | 1500 |
| MAR1 | Count |
|---|---|
| No | 2614 |
| Yes | 2580 |
| MAR2 | Count |
|---|---|
| Married | 2645 |
| Divorced | 1049 |
| Never married | 1500 |
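The MAR2 counts are consistent with widowed respondents being folded into the married group (2,580 + 65 = 2,645) and separated respondents into the divorced group (944 + 105 = 1,049). That mapping is inferred from the totals and is not stated explicitly, so the sketch below is an assumption checked against the published counts. The names `MAR2_MAP` and `recode_counts` are hypothetical.

```python
from collections import Counter

# Counts from the MAR table above.
MAR_COUNTS = {
    "Married": 2580,
    "Widowed": 65,
    "Divorced": 944,
    "Separated": 105,
    "Never married": 1500,
}

# Assumed three-way recode, inferred from the MAR2 totals.
MAR2_MAP = {
    "Married": "Married",
    "Widowed": "Married",
    "Divorced": "Divorced",
    "Separated": "Divorced",
    "Never married": "Never married",
}

def recode_counts(counts, mapping):
    """Aggregate category counts under a recoding map."""
    out = Counter()
    for category, n in counts.items():
        out[mapping[category]] += n
    return dict(out)

mar2 = recode_counts(MAR_COUNTS, MAR2_MAP)

# MAR1 is the binary indicator: currently married versus everyone else.
total = sum(MAR_COUNTS.values())
mar1 = {"Yes": MAR_COUNTS["Married"], "No": total - MAR_COUNTS["Married"]}
```

Under this mapping both derived tables match the counts reported above.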
| SCHL | Count |
|---|---|
| High school diploma | 1148 |
| Some college | 791 |
| Associate’s degree | 818 |
| Bachelor’s degree | 1726 |
| Master’s degree | 505 |
| Professional degree | 113 |
| Doctorate degree | 93 |
| SEX | Count |
|---|---|
| Male | 2627 |
| Female | 2567 |
Generate and examine histograms for quantitative variables of interest
Generate and examine bar charts/graphs for qualitative variables of interest
The sample is split nearly evenly by gender (2,627 men and 2,567 women).
Task 5:
Generate and examine cross tabulations, scatterplots, and/or correlation coefficients of interest
The correlation of 0.22 for age and earnings indicates a very weak relationship. Age is neither a primary reason for differences in earnings, nor a clear proxy for some other variable.
The correlation of 0.18 for earnings and work hours is also very weak. It would likely be stronger if the data were not restricted to those working at least 35 hours per week. Interestingly, earnings appear to drop for those working more than 60 hours per week.
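For reference, the correlation coefficients discussed above are Pearson correlations. The sketch below implements the formula directly in Python and applies it to the age and earnings values of the 20 observations listed earlier in this section. It illustrates the calculation only; a 20-row subset should not be expected to reproduce the full-sample value. The function name `pearson` is a hypothetical helper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Age and earnings for the 20 listed observations.
ages = [47, 58, 50, 39, 39, 37, 47, 60, 58, 61,
        39, 36, 30, 30, 41, 46, 53, 38, 26, 60]
earnings = [34000, 13000, 18000, 10300, 28600, 24800, 22000, 19000, 87000,
            22900, 19000, 19600, 48000, 40000, 15600, 12100, 14000, 80000,
            15100, 84000]

r_age_earnings = pearson(ages, earnings)
print(round(r_age_earnings, 3))
```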
| Earnings | RACWHT: No (N = 1,401) | RACWHT: Yes (N = 3,792) |
|---|---|---|
| Minimum | 10000 | 10000 |
| Maximum | 96000 | 100000 |
| Median | 23000 | 24600 |
| Mean | 26946.56 | 30115.28 |
| Earnings | RACBLK: No (N = 4,417) | RACBLK: Yes (N = 776) |
|---|---|---|
| Minimum | 10000 | 10000 |
| Maximum | 98000 | 100000 |
| Median | 24000 | 23000 |
| Mean | 29550.70 | 27608.03 |
| Earnings | RACOTHER: No (N = 4,380) | RACOTHER: Yes (N = 813) |
|---|---|---|
| Minimum | 10000 | 10000 |
| Maximum | 100000 | 96000 |
| Median | 24000 | 23600 |
| Mean | 29638.67 | 27222.51 |
| Earnings | MAR1: No (N = 2,614) | MAR1: Yes (N = 2,579) |
|---|---|---|
| Minimum | 10000 | 10000 |
| Maximum | 100000 | 98000 |
| Median | 22000 | 25600 |
| Mean | 27036.95 | 31514.04 |
| Earnings | SCHL: High school diploma (N = 1,147) | SCHL: Some college (N = 791) | SCHL: Associate’s degree (N = 818) | SCHL: Bachelor’s degree (N = 1,726) | SCHL: Master’s degree (N = 505) | SCHL: Professional degree (N = 113) | SCHL: Doctorate degree (N = 93) |
|---|---|---|---|---|---|---|---|
| Minimum | 10000 | 10000 | 10000 | 10000 | 10000 | 10400 | 17900 |
| Maximum | 90000 | 93000 | 90000 | 98000 | 98000 | 100000 | 96000 |
| Median | 18000 | 20000 | 20950 | 29200 | 35000 | 46000 | 60000 |
| Mean | 21948.24 | 24662.63 | 25150.86 | 32669.99 | 38262.18 | 49146.90 | 58373.12 |
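A grouped summary like the education table above can be sketched by bucketing earnings by category and summarizing each bucket. The example below is a minimal Python illustration using the 20 observations listed earlier in this section, so the group sizes are tiny and the figures are not comparable to the full-sample table. The helper name `summarize_by_group` is hypothetical.

```python
import statistics
from collections import defaultdict

# (education, earnings) pairs for the 20 listed observations.
rows = [
    ("Associate's degree", 34000), ("High school diploma", 13000),
    ("Master's degree", 18000), ("Bachelor's degree", 10300),
    ("Bachelor's degree", 28600), ("Bachelor's degree", 24800),
    ("Associate's degree", 22000), ("High school diploma", 19000),
    ("Associate's degree", 87000), ("High school diploma", 22900),
    ("Associate's degree", 19000), ("Bachelor's degree", 19600),
    ("Some college", 48000), ("Some college", 40000),
    ("High school diploma", 15600), ("High school diploma", 12100),
    ("High school diploma", 14000), ("Bachelor's degree", 80000),
    ("High school diploma", 15100), ("Doctorate degree", 84000),
]

def summarize_by_group(pairs):
    """Return {group: {"n", "median", "mean"}} for (group, value) pairs."""
    groups = defaultdict(list)
    for group, value in pairs:
        groups[group].append(value)
    return {
        group: {
            "n": len(values),
            "median": statistics.median(values),
            "mean": statistics.mean(values),
        }
        for group, values in groups.items()
    }

by_education = summarize_by_group(rows)
for group, stats in by_education.items():
    print(group, stats)
```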
\[\text{Earning} = \beta_0 + \beta_1\,\text{Divorced} + \beta_2\,\text{NeverMarried} + \beta_3\,\text{Female} + \beta_4\,\text{RaceBlack} + \beta_5\,\text{RaceOther} + \beta_6\,\text{SomeCollege} + \beta_7\,\text{Associate} + \beta_8\,\text{Bachelor} + \beta_9\,\text{Master} + \beta_{10}\,\text{Professional} + \beta_{11}\,\text{Doctoral} + \beta_{12}\,\text{Age}\]

Call: `lm(formula = PERNP ~ Divorced + NeverMarried + Female + RaceBlack + RaceOther + SomeCollege + Associate + Bachelor + Master + Professional + Doctoral + AGEP, data = ss16ppr)`

Residuals:

| Min | 1Q | Median | 3Q | Max |
|---|---|---|---|---|
| -43686 | -9198 | -2921 | 5348 | 63456 |

Coefficients:

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | 12519.82 | 1104.43 | 11.336 | < 0.0000000000000002 | *** |
| Divorced | -1146.91 | 546.10 | -2.100 | 0.035760 | * |
| NeverMarried | -2959.79 | 517.30 | -5.722 | 0.000000011144 | *** |
| Female | -4824.55 | 429.48 | -11.234 | < 0.0000000000000002 | *** |
| RaceBlack | -1301.52 | 589.11 | -2.209 | 0.027196 | * |
| RaceOther | -2132.00 | 577.36 | -3.693 | 0.000224 | *** |
| SomeCollege | 4311.57 | 691.75 | 6.233 | 0.000000000494 | *** |
| Associate | 4232.24 | 685.40 | 6.175 | 0.000000000713 | *** |
| Bachelor | 12406.91 | 584.22 | 21.237 | < 0.0000000000000002 | *** |
| Master | 17855.61 | 808.79 | 22.077 | < 0.0000000000000002 | *** |
| Professional | 28203.02 | 1469.19 | 19.196 | < 0.0000000000000002 | *** |
| Doctoral | 35732.87 | 1609.22 | 22.205 | < 0.0000000000000002 | *** |
| AGEP | 285.13 | 20.95 | 13.609 | < 0.0000000000000002 | *** |

Signif. codes: `0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1`

Residual standard error: 14860 on 5180 degrees of freedom. Multiple R-squared: 0.2484, Adjusted R-squared: 0.2467. F-statistic: 142.7 on 12 and 5180 DF, p-value: < 0.00000000000000022.

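To make the fitted coefficients concrete, the sketch below turns the estimates reported above into a small prediction function in Python. It assumes the omitted reference categories are married, male, white, and high school diploma, which is inferred from the dummies that are missing from the formula. The function name `predict_earnings` is hypothetical.

```python
# Point estimates copied from the regression output above.
COEF = {
    "(Intercept)": 12519.82,
    "Divorced": -1146.91,
    "NeverMarried": -2959.79,
    "Female": -4824.55,
    "RaceBlack": -1301.52,
    "RaceOther": -2132.00,
    "SomeCollege": 4311.57,
    "Associate": 4232.24,
    "Bachelor": 12406.91,
    "Master": 17855.61,
    "Professional": 28203.02,
    "Doctoral": 35732.87,
    "AGEP": 285.13,
}

def predict_earnings(age, indicators=()):
    """Predicted annual earnings for a given age and set of active dummies.

    Any dummy not listed is zero, i.e. the person is in the assumed
    reference group (married, male, white, high school diploma).
    """
    total = COEF["(Intercept)"] + COEF["AGEP"] * age
    for name in indicators:
        if name in ("(Intercept)", "AGEP"):
            raise ValueError(f"{name} is not an indicator variable")
        total += COEF[name]
    return total

male_ba = predict_earnings(40, ["Bachelor"])
female_ba = predict_earnings(40, ["Bachelor", "Female"])
print(round(male_ba, 2), round(female_ba, 2))
```

For a 40-year-old in the reference group with a bachelor's degree, the model predicts about 36,332 for a man and about 31,507 for a woman. These are point predictions only and ignore the sizable residual standard error.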
\[\text{Earning} = \beta_0 + \beta_1\,\text{Female} + \beta_2\,\text{SomeCollege} + \beta_3\,\text{Associate} + \beta_4\,\text{Bachelor} + \beta_5\,\text{Master} + \beta_6\,\text{Professional} + \beta_7\,\text{Doctoral}\]

Call: `lm(formula = PERNP ~ Female + SomeCollege + Associate + Bachelor + Master + Professional + Doctoral, data = ss16ppr)`

Residuals:

| Min | 1Q | Median | 3Q | Max |
|---|---|---|---|---|
| -38660 | -9524 | -3508 | 5802 | 67102 |

Coefficients:

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | 23324.4 | 471.0 | 49.521 | < 0.0000000000000002 | *** |
| Female | -4656.1 | 440.7 | -10.566 | < 0.0000000000000002 | *** |
| SomeCollege | 3339.6 | 711.0 | 4.697 | 0.000002704 | *** |
| Associate | 4000.8 | 705.6 | 5.670 | 0.000000015 | *** |
| Bachelor | 12229.4 | 601.2 | 20.343 | < 0.0000000000000002 | *** |
| Master | 17934.3 | 832.9 | 21.532 | < 0.0000000000000002 | *** |
| Professional | 28336.0 | 1515.3 | 18.700 | < 0.0000000000000002 | *** |
| Doctoral | 37602.1 | 1656.5 | 22.699 | < 0.0000000000000002 | *** |

Signif. codes: `0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1`

Residual standard error: 15330 on 5185 degrees of freedom. Multiple R-squared: 0.1989, Adjusted R-squared: 0.1979. F-statistic: 184 on 7 and 5185 DF, p-value: < 0.00000000000000022.
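As a worked reading of these estimates, the predicted earnings for a few groups follow directly from the coefficients. The baseline is assumed to be a man with a high school diploma, since those are the categories without a dummy in the formula.

\[\widehat{\text{Earning}}_{\text{male, HS}} = 23324.4\]
\[\widehat{\text{Earning}}_{\text{male, Bachelor}} = 23324.4 + 12229.4 = 35553.8\]
\[\widehat{\text{Earning}}_{\text{female, HS}} = 23324.4 - 4656.1 = 18668.3\]
\[\widehat{\text{Earning}}_{\text{female, Bachelor}} = 23324.4 - 4656.1 + 12229.4 = 30897.7\]

Because this specification contains no interaction between Female and the education dummies, the estimated bachelor's premium (12229.4) is the same for men and women, and the estimated gender gap (4656.1) is the same at every education level.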